Create a Field Replacement Visitor for replacing fields in ranges#3211
Create a Field Replacement Visitor for replacing fields in ranges#3211awrigh11 wants to merge 5 commits intointegrationfrom
Conversation
| } | ||
|
|
||
| @Test | ||
| public void multiRuleReplacementTest() throws Exception { |
| // need to fetch field to datatype map first | ||
| timedFetchDatatypes(timers, "Fetch Required Datatypes", config.getQueryTree(), config); | ||
|
|
||
| if (!disableFieldReplacementRules) { |
There was a problem hiding this comment.
perhaps a comment here for our future selves about how this must be called before expand multi normalized terms, so no one accidentally refactors this in the wrong order
There was a problem hiding this comment.
That would be extremely helpful
|
|
||
| import datawave.query.jexl.JexlASTHelper; | ||
|
|
||
| public class DirectFieldReplacementRule implements FieldReplacementRule { |
There was a problem hiding this comment.
Add class level documentation for what the expected behavior is, what types of nodes are affected, etc
| private String field = null; | ||
| private String replacement = null; |
|
|
||
| import org.apache.commons.jexl3.parser.JexlNode; | ||
|
|
||
| public interface FieldReplacementRule { |
There was a problem hiding this comment.
Add brief class level doc
| import datawave.query.jexl.nodes.QueryPropertyMarker; | ||
| import datawave.query.jexl.visitors.RebuildingVisitor; | ||
|
|
||
| public class RangeFieldReplacementRule implements FieldReplacementRule { |
There was a problem hiding this comment.
Visitor name should be updated to include "BoundedRange", range on it's own could be any range operator (LT, LE, GT, GE).
The direct field replacement rule actually replaces the field. This visitor doesn't actually replace the bounded range, consider an alternate name.
|
|
||
| // Create a copy and mark it as "evaluationOnly" | ||
| JexlNode copy = RebuildingVisitor.copy(node); | ||
| JexlNode evalNode = QueryPropertyMarker.create(copy, EVALUATION_ONLY); |
There was a problem hiding this comment.
Why mark the original as evaluation only? Are the replacement fields always indexed?
If you had a case where the original field is indexed and the replacement field is not indexed then the executability of the query was just destroyed, in cases where the bounded range is an anchor term.
| public class FieldReplacementVisitorTest { | ||
| private static final Logger log = Logger.getLogger(FieldReplacementVisitorTest.class); | ||
| private static final DirectFieldReplacementRule dfrRule = new DirectFieldReplacementRule("ABC", "XYZ"); | ||
| private static final Map<String,String> rangeMap = Map.of("AA", "BB", "CC", "DD"); |
There was a problem hiding this comment.
I like the convention you setup in the direct field replacement rule where ABC -> XYZ. You might consider doing the same with the range map, something like
AA -> XX
BB -> YY
CC -> ZZ
| TreeEqualityVisitor.Comparison comparison = TreeEqualityVisitor.checkEquality(expectedScript, actualScript); | ||
| if (!comparison.isEqual()) { | ||
| log.error("Expected " + PrintingVisitor.formattedQueryString(expectedScript)); | ||
| log.error("Actual " + PrintingVisitor.formattedQueryString(actualScript)); | ||
| } | ||
| assertTrue(comparison.getReason(), comparison.isEqual()); |
There was a problem hiding this comment.
+1. This is the way to use the TreeEqualityVisitor. Excellent work.
| // Verify the script is as expected, and has a valid lineage. | ||
| assertScriptEquality(resultScript, expected); | ||
| assertLineage(resultScript); | ||
|
|
||
| // Verify the original script was not modified, and still has a valid lineage. | ||
| assertScriptEquality(originalScript, original); | ||
| assertLineage(originalScript); |
There was a problem hiding this comment.
Asserting script equality AND lineage? Love it.
| import datawave.query.planner.replacement.rules.FieldReplacementRule; | ||
| import datawave.query.planner.replacement.rules.RangeFieldReplacementRule; | ||
|
|
||
| public class FieldReplacementVisitorTest { |
There was a problem hiding this comment.
These tests are a good start. Given that the visitor supports replacing both single fields and bounded ranges it would be prudent to cover the following cases
- single term
- single term in union
- single term in intersection
- bounded range (covered)
- bounded range in union (covered)
- bounded range in intersection
It would be nice to see a field replacement rule run against each of the basic node types (EQ, NE, ER, NR, LT, GT, LE, GE) just to document what the expected behavior is.
Based on reading the code there is a bug where the bounded range rules will not be applied to an intersection of bounded ranges. Fixing that bug could lead to another bug where a direct field replacement rule is accidentally run on a bounded range. So there should be a test case added that looks like this:
direct field rule AA -> XX
(bounded range for AA)
```
| // @formatter:off | ||
| String query = "((_Bounded_ = true) && (CC >= '2' && CC <= '4')) || ((_Bounded_ = true) && (AA >= '2' && AA <= '4'))"; | ||
| String expected = "(((_Eval_ = true) && ((_Bounded_ = true) && (CC >= '2' && CC <= '4'))) && " + | ||
| "((_Bounded_ = true) && (DD >= '2' && DD <= '4'))) || " + | ||
| "(((_Eval_ = true) && ((_Bounded_ = true) && (AA >= '2' && AA <= '4'))) && " + | ||
| "((_Bounded_ = true) && (BB >= '2' && BB <= '4')))" ; | ||
| // @formatter:on |
There was a problem hiding this comment.
I really like the way this is broken out. Makes the test easy to read.
No description provided.